An Expectation-Maximization Algorithm Working on Data Summary
نویسندگان
چکیده
Scalable cluster analysis addresses the problem of processing large data sets with limited resources, e.g., memory and computation time. A data summarization or sampling procedure is an essential step of most scalable algorithms. It forms a compact representation of the data. Based on it, traditional clustering algorithms can process large data sets efficiently. However, there is little work on how to effectively make cluster analysis on data summaries. From the principle of the general expectation-maximization algorithm, we propose a model-based clustering algorithm to make better use of these data summaries in this paper. The proposed EMACF (Expectation-Maximization Algorithm on Clustering Features) algorithm employs such data summary features as weight, mean, and variance explicitly. It is proved that EMACF converges to a local maximum likelihood value. EMACF is linear with the number of data summaries instead of data items, and thus can be integrated with any efficient data summarization procedure to construct a scalable clustering algorithm.
منابع مشابه
The Development of Maximum Likelihood Estimation Approaches for Adaptive Estimation of Free Speed and Critical Density in Vehicle Freeways
The performance of many traffic control strategies depends on how much the traffic flow models have been accurately calibrated. One of the most applicable traffic flow model in traffic control and management is LWR or METANET model. Practically, key parameters in LWR model, including free flow speed and critical density, are parameterized using flow and speed measurements gathered by inductive ...
متن کاملScaling-Up Model-Based Clustering Algorithm by Working on Clustering Features
In this paper, we propose EMACF (Expectation-Maximization Algorithm for Clustering Features) to generate clusters from data summaries rather than data items directly. Incorporating with an adaptive grid-based data summarization procedure, we establish a scalable clustering algorithm: gEMACF. The experimental results show that gEMACF can generate more accurate results than other scalable cluster...
متن کاملQuantitative SPECT and planar 32P bremsstrahlung imaging for dosimetry purpose –An experimental phantom study
Background: In this study, Quantitative 32P bremsstrahlung planar and SPECT imaging and consequent dose assessment were carried out as a comprehensive phantom study to define an appropriate method for accurate Dosimetry in clinical practice. Materials and Methods: CT, planar and SPECT bremsstrahlung images of Jaszczak phantom containing a known activity of 32P were acquired. In addition, Phanto...
متن کاملThe Development of Maximum Likelihood Estimation Approaches for Adaptive Estimation of Free Speed and Critical Density in Vehicle Freeways
The performance of many traffic control strategies depends on how much the traffic flow models are accurately calibrated. One of the most applicable traffic flow model in traffic control and management is LWR or METANET model. Practically, key parameters in LWR model, including free flow speed and critical density, are parameterized using flow and speed measurements gathered by inductive loop d...
متن کاملThe Expectatio Maximization Algorithm
common task in signal processing is the estimation of the parameters of a probability distribution function on. Perhaps the most frequently encountered estimation problem is the estimation of the mean of a signal in noise. In many parameter estimation problems the situation is more complicated because direct access to the data necessary to estimate the parameters is impossible, or some of the d...
متن کامل